[Misc] Add 20 regression tests for 11 tool parser bug fixes by bbrowning · Pull Request #38172 · vllm-project/vllm

bbrowning · 2026-03-26T01:40:11Z

Purpose

Claude Code and I audited recent tool parser bug-fix PRs (Sept 2025 until now) and found that several landed without corresponding test coverage. This is purely additive test coverage to prevent regressions as we refactor, clean up, and redesign some of these areas.

Mistral: fast detokenization text detection (PR Fix some Mistral parser issues #37209)
Qwen3Coder: malformed XML crash, anyOf double-encoding, speculative decode streaming (PRs [Bugfix] fix Qwen3.5 tool calling bug #36774, qwen3coder tool parser fix anyOf double encoded parameters #36032, [Tool Parser] Fix Qwen3Coder streaming parameter loss with speculative decode #35615)
DeepSeekV32: delimiter preservation with fast detokenization, skip_special_tokens adjustment (PR [Bugfix] Fix the issue where tool calling does not work when using fast detokenization with dsv32 #33964)
GLM-4 MoE: zero-argument tool calls, transformers 5.x delimiter handling, Unicode character preservation (PRs fix: avoid crash on zero-arg tool calls in glm4 parser #32321, Fix GLM-4.6v flash tool calling in transformers 5.x #31622, [Bugfix] Fix Unicode issues in GLM-4 tool calling #30920)
MiniMax M2: anyOf nullable parameter handling for non-null and null values (PR Fix optional parameter parsing in MiniMax M2 tool parser #32278 #32342)
Step3p5: MTP-style variable-chunk and multi-token streaming (PR [Bugfix] Fix step3p5 parser when using mtp #33690)
Kimi K2: native tool call ID extraction and multi-turn ID continuity (PR fix: preserve native tool call ID in multi-turn tool calling #32768)
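
As a rough illustration of the variable-chunk streaming cases listed above, a regression test of this kind can cut one model output at irregular boundaries and check that the streamed result matches the one-shot parse. The sketch below is not vLLM code: `ToyStreamingParser`, the `<arg>` tag format, and `split_into_chunks` are hypothetical stand-ins chosen only to show the pattern.

```python
import re

# Hypothetical tag format, used only for this illustration.
ARG_RE = re.compile(r'<arg name="(\w+)">(.*?)</arg>', re.DOTALL)


def parse_all(text: str) -> dict[str, str]:
    """One-shot parse: collect every complete <arg> element."""
    return {name: value for name, value in ARG_RE.findall(text)}


class ToyStreamingParser:
    """Hypothetical incremental parser that buffers text until an element closes."""

    def __init__(self) -> None:
        self.buffer = ""
        self.args: dict[str, str] = {}

    def feed(self, delta: str) -> None:
        self.buffer += delta
        # Consume every complete element in the buffer and keep the partial tail.
        last_end = 0
        for match in ARG_RE.finditer(self.buffer):
            self.args[match.group(1)] = match.group(2)
            last_end = match.end()
        self.buffer = self.buffer[last_end:]


def split_into_chunks(text: str, sizes: list[int]) -> list[str]:
    """Cut text into pieces whose lengths cycle through `sizes`."""
    chunks, pos, i = [], 0, 0
    while pos < len(text):
        size = sizes[i % len(sizes)]
        chunks.append(text[pos:pos + size])
        pos += size
        i += 1
    return chunks


def stream_parse(text: str, sizes: list[int]) -> dict[str, str]:
    """Feed the text chunk by chunk and return what the parser collected."""
    parser = ToyStreamingParser()
    for chunk in split_into_chunks(text, sizes):
        parser.feed(chunk)
    return parser.args


output = '<arg name="city">Paris</arg><arg name="unit">celsius</arg>'
for sizes in ([1], [3, 7, 2], [50]):
    # Chunk boundaries must not change the final result.
    assert stream_parse(output, sizes) == parse_all(output)
```

Cycling through several chunk sizes is a cheap way to place boundaries inside tags, between parameters, and across multiple parameters at once, which is the situation speculative decoding or MTP can produce.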

Test Plan

pytest -sv tests/tool_parsers

Test Result

All the new tests passed, and all the old ones continue to pass.

510 passed, 1 skipped, 42 xfailed, 2 warnings

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request introduces a comprehensive suite of regression tests across multiple tool parsers, including DeepSeekV32, GLM4-MoE, KimiK2, MinimaxM2, Mistral, Qwen3, and Step3p5. These tests address various edge cases and potential issues such as delimiter preservation, skip_special_tokens logic, handling of zero-argument and malformed tool calls, Unicode character preservation, native tool call ID extraction, anyOf nullable parameter parsing, fast detokenization, and streaming behavior with multi-parameter and variable-sized chunks. A review comment highlights a malformed JSON string in a MinimaxM2 test, which needs to be corrected to ensure the test functions as intended.

tests/tool_parsers/test_minimax_m2_tool_parser.py

bbrowning · 2026-03-26T11:35:05Z

Gemini got confused counting JSON braces within the xml tags within the Python strings of the test, but I double-checked the test it highlighted just to be sure. And then gave it a thumbs-down for good measure, in case they use that to improve training.
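
Counting braces by eye is error-prone for this kind of fixture; one way to double-check it is to extract each tag body and hand it to `json.loads`. A minimal sketch, assuming a hypothetical `<parameter name="...">` layout and a made-up fixture, not the contents of the actual test:

```python
import json
import re

# Hypothetical tag layout, assumed for illustration only.
PARAM_RE = re.compile(r'<parameter name="([^"]+)">(.*?)</parameter>', re.DOTALL)


def check_json_parameters(model_output: str) -> dict[str, object]:
    """Parse every parameter body as JSON; raise ValueError naming the bad one."""
    parsed = {}
    for name, body in PARAM_RE.findall(model_output):
        try:
            parsed[name] = json.loads(body)
        except json.JSONDecodeError as exc:
            raise ValueError(f"parameter {name!r} is not valid JSON: {exc}") from exc
    return parsed


# Made-up fixture: nested JSON inside XML-style tags inside a Python string.
fixture = (
    '<parameter name="filter">{"tags": ["a", "b"], "range": {"min": 1}}</parameter>'
    '<parameter name="limit">null</parameter>'
)
print(check_json_parameters(fixture))
```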

sfeng33

LGTM, thank you for the thorough work!

mergify · 2026-03-30T06:12:49Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @bbrowning.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

aarnphm

stamp, given that @sfeng33 already reviewed this.

aarnphm · 2026-03-30T20:43:40Z

@bbrowning there is a merge conflict here. Can you fix this?

Audited recent tool parser bug-fix PRs and found that several landed without corresponding test coverage. Added unit tests for each fix to prevent regressions.

- Mistral: fast detokenization text detection (PR vllm-project#37209)
- Qwen3Coder: malformed XML crash, anyOf double-encoding, speculative decode streaming (PRs vllm-project#36774, vllm-project#36032, vllm-project#35615)
- DeepSeekV32: delimiter preservation with fast detokenization, skip_special_tokens adjustment (PR vllm-project#33964)
- GLM-4 MoE: zero-argument tool calls, transformers 5.x delimiter handling, Unicode character preservation (PRs vllm-project#32321, vllm-project#31622, vllm-project#30920)
- MiniMax M2: anyOf nullable parameter handling for non-null and null values (PR vllm-project#32342)
- Step3p5: MTP-style variable-chunk and multi-token streaming (PR vllm-project#33690)
- Kimi K2: native tool call ID extraction and multi-turn ID continuity (PR vllm-project#32768)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>

bbrowning · 2026-03-31T12:41:40Z

Rebasing this locally, it appears two of the tests that were previously passing are now failing. I'll investigate, but it looks like between when I opened this PR and now we may have regressed on two of these test cases already...

…llm-project#38189)

After the refactor in vllm-project#38189 to use self.tools instead of request.tools, anyOf regression tests need to provide tools at parser construction time so the schema is available for type resolution.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ben Browning <bbrownin@redhat.com>

bbrowning · 2026-03-31T13:50:43Z

Ok, rebased and force-pushed with the conflict fix as well as adjusting the tests to move to the new format where tools are passed when constructing the parser. That is what initially caused my local failures after fixing the conflict, so it wasn't a regression in our parsers after all but just me needing to update these tests after PR 38189 landed.
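
To illustrate why the anyOf tests were sensitive to this change: when a parser resolves parameter types from the tool schema, the schema has to be available on the parser itself, otherwise every value falls back to a plain string. The sketch below is a toy model, not the vLLM API; `ToyToolParser`, the shape of its `tools` argument, and `convert` are hypothetical names used only to show the pattern.

```python
import json


class ToyToolParser:
    """Hypothetical parser that receives tool schemas at construction time."""

    def __init__(self, tools: list[dict]) -> None:
        # Index parameter schemas by (tool name, parameter name).
        self.schemas = {
            (tool["name"], param): schema
            for tool in tools
            for param, schema in tool["parameters"]["properties"].items()
        }

    def allowed_types(self, tool: str, param: str) -> set[str]:
        """Return the declared JSON Schema types, flattening anyOf branches."""
        schema = self.schemas.get((tool, param), {})
        if "anyOf" in schema:
            return {b["type"] for b in schema["anyOf"] if "type" in b}
        # With no schema available, treat the value as a plain string.
        return {schema["type"]} if "type" in schema else {"string"}

    def convert(self, tool: str, param: str, raw: str):
        """Convert raw text from the model output using the declared types."""
        types = self.allowed_types(tool, param)
        if "null" in types and raw.strip().lower() == "null":
            return None
        if types & {"object", "array", "integer", "number", "boolean"}:
            try:
                return json.loads(raw)
            except json.JSONDecodeError:
                pass  # Fall through and keep the raw text.
        return raw


tools = [{
    "name": "search",
    "parameters": {
        "type": "object",
        "properties": {
            "filter": {"anyOf": [{"type": "object"}, {"type": "null"}]},
            "query": {"type": "string"},
        },
    },
}]

with_tools = ToyToolParser(tools)
without_tools = ToyToolParser([])
print(with_tools.convert("search", "filter", "null"))
print(without_tools.convert("search", "filter", "null"))
```

In this toy model the parser built without tools returns the literal string for a nullable parameter, which is the same shape of failure a test would see if it kept supplying tools only through the request.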

chaunceyjiang

Thanks~

…ject#38172) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

…ject#38172) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Signed-off-by: rishitdholakia13 <rishit+github@cohere.com>

…ject#38172) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com> Signed-off-by: Rishi Puri <riship@nvidia.com>

…ject#38172) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

claude bot reviewed Mar 26, 2026

mergify bot added the deepseek (Related to DeepSeek models), qwen (Related to Qwen models), tool-calling, and bug (Something isn't working) labels Mar 26, 2026

github-project-automation bot added this to Tool Calling Mar 26, 2026

gemini-code-assist bot reviewed Mar 26, 2026

tests/tool_parsers/test_minimax_m2_tool_parser.py (review thread resolved)

bbrowning mentioned this pull request Mar 26, 2026

[RFC]: Consolidated tool call parser implementations by type (JSON, Python, XML, Harmony) #27661

Open

10 tasks

sfeng33 approved these changes Mar 26, 2026

DarkLight1337 requested a review from chaunceyjiang March 28, 2026 04:01

mergify bot added the needs-rebase label Mar 30, 2026

aarnphm approved these changes Mar 30, 2026

bbrowning force-pushed the tool-parser-regression-tests branch from 6b6a09c to 386548d Compare March 31, 2026 13:47

mergify bot removed the needs-rebase label Mar 31, 2026

aarnphm added the ready (ONLY add when PR is ready to merge/full CI is needed) label Mar 31, 2026

Merge branch 'main' into tool-parser-regression-tests

71ffd2d

chaunceyjiang enabled auto-merge (squash) April 1, 2026 02:16

chaunceyjiang approved these changes Apr 1, 2026

chaunceyjiang merged commit cb0b443 into vllm-project:main Apr 1, 2026
14 checks passed

github-project-automation bot moved this to Done in Tool Calling Apr 1, 2026

bbrowning deleted the tool-parser-regression-tests branch April 1, 2026 11:27

yzong-rh pushed a commit to yzong-rh/vllm that referenced this pull request Apr 3, 2026

[Misc] Add 20 regression tests for 11 tool parser bug fixes (vllm-pro…

9933d11

…ject#38172) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026

[Misc] Add 20 regression tests for 11 tool parser bug fixes (vllm-pro…

cf1a449

…ject#38172) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

EricccYang pushed a commit to EricccYang/vllm that referenced this pull request Apr 10, 2026

[Misc] Add 20 regression tests for 11 tool parser bug fixes (vllm-pro…

cfe708a

…ject#38172) Signed-off-by: Ben Browning <bbrownin@redhat.com> Co-authored-by: Chauncey <chaunceyjiang@gmail.com>

chaunceyjiang merged 3 commits into vllm-project:main from bbrowning:tool-parser-regression-tests

4 participants
